JUL-19-06 12:04PM FROM-Fenwick & West Mountain View 650 938 5200 T-744 P. 004/012 F-890 

RECEIVED 
CENTRAL FAX CENTER 

JUL 19 2006 

AMENDMENTS TO THE CLAIMS 
Please cancel claims 1, 13, and 17 and amend claims 2-5, 7-8, 12, 14-16, and 18-24 as 
follows: 

1. (Canceled). 

2. (Currently Amended) [[The method of claim 1, further comprising:]] A method for 
information extraction, comprising: 

accessing a plurality of related articles; 
determining a seed article from the related articles; 
identifying at least one information field within the seed article by comparing the seed 
article to at least one other related article; and 
[[determining a label for the information field; and 
associating a pointer to a location of the information field in the seed article to create]] 
creating a template based on the identified information field. 

3. (Currently Amended) The method of claim [[1]] 2, wherein comparing the seed article to 
at least one other related article is performed by a dynamic programming alignment algorithm to 
determine an alignment between the seed article and the related article. 
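
For illustration only, the comparison recited in claim 3 can be sketched as a standard edit-distance alignment with a traceback. The claims do not specify the scoring scheme or how an article is tokenized; the unit costs, the token-list representation, and the names `align` and `variable_positions` are assumptions made for this sketch.

```python
def align(seed, other):
    """Align two token lists; return (seed_token, other_token) pairs, None marks a gap."""
    n, m = len(seed), len(other)
    # cost[i][j] = minimum edit cost to align seed[:i] with other[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (seed[i - 1] != other[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (seed[i - 1] != other[j - 1]):
            pairs.append((seed[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((seed[i - 1], None))
            i -= 1
        else:
            pairs.append((None, other[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def variable_positions(pairs):
    """Indices of aligned pairs whose tokens differ: candidate information fields."""
    return [k for k, (s, o) in enumerate(pairs) if s != o]
```

Under these assumptions, aligning `["Name:", "Alice", "Age:", "30"]` with `["Name:", "Bob", "Age:", "41"]` pairs the tokens one to one, and the differing positions (the name and the age) are the candidates for information fields.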

4. (Currently Amended) The method of claim [[1]] 2, further comprising determining a 
cluster of related articles from the related articles. 

5. (Currently Amended) The method of claim 4, wherein determining [[a]] the cluster of 
related articles is performed by: 

using a dynamic programming alignment algorithm to compute edit distances between the 
seed article and [[all of]] the related articles; and 
choosing the cluster of related articles based on the edit distances. 
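
As a hedged sketch of the two steps in claim 5, the following computes token-level edit distances from the seed article and keeps the articles that fall within a cutoff. The claim says only that the cluster is chosen "based on the edit distances"; the fixed threshold `max_distance` and the function names are assumptions of this example, not something the claims recite.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences, using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def choose_cluster(seed, articles, max_distance):
    """Keep the articles whose distance to the seed is at most max_distance (assumed rule)."""
    return [art for art in articles if edit_distance(seed, art) <= max_distance]
```

Other selection rules, such as keeping the k nearest articles, would equally be "based on the edit distances"; the threshold is simply the shortest to show.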

6. (Original) The method of claim 4, wherein the identifying at least one information 
field within the seed article is performed by comparing the seed article to the cluster of articles. 

7. (Currently Amended) The method of claim [[1]] 2, wherein the information field 
corresponds to variable data. 

8. (Currently Amended) The method of claim [[1]] 2, wherein the articles are web pages. 

9. (Original) The method of claim 8, wherein the related articles are web pages on a 
web site. 

10. (Original) The method of claim 9, further comprising simplifying the content on a 
web page. 

11. (Original) The method of claim 10, wherein simplifying the content includes 
preserving visible text, visible images, and visible paragraph and table formatting. 
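
One possible reading of the simplification in claims 10 and 11 is sketched below with Python's standard `html.parser` module. The claims do not say which markup counts as paragraph or table formatting; the tag sets `KEEP_TAGS` and `SKIP_CONTENT` and the flat token output are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Assumed tag sets: the claims do not enumerate them.
KEEP_TAGS = {"p", "br", "table", "tr", "td", "th", "img"}
SKIP_CONTENT = {"script", "style"}  # content that is never visible
VOID_TAGS = {"br", "img"}           # tags with no closing counterpart


class Simplifier(HTMLParser):
    """Reduce a page to visible words plus image, paragraph, and table markers."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip_depth += 1
        elif tag in KEEP_TAGS and self._skip_depth == 0:
            if tag == "img":
                src = dict(attrs).get("src") or ""
                self.tokens.append(f'<img src="{src}">')
            else:
                self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in KEEP_TAGS and tag not in VOID_TAGS and self._skip_depth == 0:
            self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.tokens.extend(data.split())


def simplify(html):
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    return parser.tokens
```

A token list of this kind is a convenient input for a token-level alignment, since styling and script markup that varies for unrelated reasons is discarded before articles are compared.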

12. (Currently Amended) The method of claim 2, further comprising: 
identifying a plurality of templates each comprising at least one information field; 
comparing a source article to the templates to determine [[the]] a closest template; 
associating data from the source article with an information field from the closest 
template; and 
extracting the associated data. 

13. (Canceled). 

14. (Currently Amended) [[The method of claim 13, further comprising:]] A method of 
extracting data from a source article, comprising: 

identifying a plurality of templates each comprising at least one information field; 
comparing the source article to the templates to determine a closest template; 
associating data from the source article with an information field from the closest 
template; and 
extracting the associated data. 

15. (Currently Amended) The method of claim [[13]] 14, wherein comparing the source article to 
the templates is performed by a dynamic programming alignment algorithm to compute an edit 
distance between the source article and the templates. 

16. (Currently Amended) The method of claim [[13]] 14, wherein the source article is a web 
page. 
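
The steps of claims 14 and 15 can be illustrated, under stated assumptions, by aligning the source article against each template, taking the template with the smallest edit distance, and reading off the source tokens that line up with fields. The template encoding (a token list in which a field is written `{label}`), the zero cost for matching a field, and the names `is_field`, `align_to_template`, and `extract` are hypothetical choices for this sketch.

```python
def is_field(token):
    """Assumed convention: a template token such as '{name}' marks an information field."""
    return token.startswith("{") and token.endswith("}")


def align_to_template(source, template):
    """Return (edit distance, {field label: source token}) for one template."""
    n, m = len(source), len(template)

    def sub(i, j):
        # A field accepts any source token at no cost; literal tokens must match.
        t = template[j - 1]
        return 0 if is_field(t) or source[i - 1] == t else 1

    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + sub(i, j),
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)

    # Trace back and record which source token each field was aligned with.
    extracted = {}
    i, j = n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            if is_field(template[j - 1]):
                extracted[template[j - 1][1:-1]] = source[i - 1]
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return cost[n][m], extracted


def extract(source, templates):
    """Pick the closest template by edit distance and return its extracted field data."""
    _, fields = min((align_to_template(source, t) for t in templates),
                    key=lambda result: result[0])
    return fields
```

For example, given the templates `["Name:", "{name}", "Age:", "{age}"]` and `["Title:", "{title}", "Price:", "{price}"]`, the source `["Name:", "Carol", "Age:", "27"]` is at distance 0 from the first and 2 from the second, so the first is chosen and the values for `name` and `age` are extracted.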

17. (Canceled). 

18. (Currently Amended) [[The computer readable medium of claim 17, further comprising:]] 
A computer program product for information extraction, comprising: 

a computer-readable medium; and 
computer program code, encoded on the medium, for: 
accessing a plurality of related articles; 
determining a seed article from the related articles; 
identifying at least one information field within the seed article by comparing the 
seed article to at least one other related article; and 
[[program code for determining a label for the information field; and 
program code for associating a pointer to a location of the information field in the 
seed article to create]] 
creating a template based on the identified information field. 

19. (Currently Amended) The computer program product [[readable medium]] of claim [[17]] 18, 
wherein comparing the seed article to at least one other related article is performed by a dynamic 
programming alignment algorithm to determine an alignment between the seed article and the 
related article. 

20. (Currently Amended) The computer program product [[readable medium]] of claim [[17]] 18, 
further comprising computer program code for determining a cluster of related articles from the 
related articles. 

21. (Currently Amended) The computer program product [[readable medium]] of claim 20, 
wherein determining a cluster of related articles is performed by: 

using a dynamic programming alignment algorithm to compute edit distances between the 
seed article and all of the related articles; and 
choosing the cluster of related articles based on the edit distances. 

22. (Currently Amended) The computer program product [[readable medium]] of claim 20, 
wherein the identifying at least one information field within the seed article is performed by 
comparing the seed article to the cluster of related articles. 

23. (Currently Amended) The computer program product [[readable medium]] of claim 18, 
further comprising computer program code for: 

[[program code for]] identifying a plurality of templates each comprising at least one 
information field; 

[[program code for]] comparing a source article to the templates to determine [[the]] a closest 
template; 

[[program code for]] associating data from the source article with an information field from 
the closest template; and 

[[program code for]] extracting the associated data. 

24. (Currently Amended) The computer program product [[readable medium]] of claim 23, 
wherein comparing the source article to the templates is performed by a dynamic programming 
alignment algorithm to compute an edit distance between the source article and the templates. 